United States Patent [19] 

Gorczyca et al. 



US005822531A
[11] Patent Number: 5,822,531
[45] Date of Patent: Oct. 13, 1998



[54] METHOD AND SYSTEM FOR DYNAMICALLY RECONFIGURING A
CLUSTER OF COMPUTER SYSTEMS

[75] Inventors: Robert Gorczyca, Cambridge; Aamir 
Arshad Rashid, Norwood, both of 
Mass.; Kevin Forress Rodgers, Derry,
N.H.; Stuart Warnsman, Watertown, 
Mass.; Thomas Van Weaver, Dripping 
Springs, Tex. 

[73] Assignee: International Business Machines
Corporation, Armonk, N.Y.

[21] Appl. No.: 681,324 

[22] Filed: Jul. 22, 1996 

[51] Int. Cl.6 G06F 9/445; G06F 9/06

[52] U.S. Cl. 395/200.51; 395/653; 707/202

[58] Field of Search 395/284, 200.62, 395/827, 830, 712,
182.18, 183.01, 653, 200.5; 384/DIG. 1, DIG. 2; 379/115, 207, 34



[56] References Cited 

U.S. PATENT DOCUMENTS 

5,161,102 11/1992 Griffin et al 385/284 

5,257,368 10/1993 Benson et al 395/827 

Primary Examiner— Daniel H. Pan 

Attorney, Agent, or Firm—Richard A. Henkler; Brian F. 

Russell; Andrew J. Dillon 



[57] ABSTRACT



A method and system for dynamically reconfiguring a 
cluster of computer systems are disclosed. In accordance 
with the present invention, a first configuration file is pro- 
vided at a plurality of computer systems within a cluster that 
specifies a current configuration of the cluster. A second 
configuration file is created at each of the plurality of 
computer systems that specifies a modified configuration of 
the cluster. The modified configuration is then verified. In 
response to the verification, the cluster is operated utilizing 
the modified configuration. In accordance with one 
embodiment, the dynamic reconfiguration of the cluster can 
include a reconfiguration of the cluster topology or 
resources. 

22 Claims, 6 Drawing Sheets 
METHOD AND SYSTEM FOR DYNAMICALLY RECONFIGURING A
CLUSTER OF COMPUTER SYSTEMS

BACKGROUND OF THE INVENTION

1. Technical Field

The present invention relates in general to a method and
system for data processing and in particular to a method and
system for configuring a cluster of computer systems. Still
more particularly, the present invention relates to a method
and system for dynamically reconfiguring a cluster of
computer systems, which permit the default configuration of the
cluster to be modified without limiting the availability of
system resources and services owned by the cluster.

2. Description of the Related Art

Data processing systems are frequently utilized to process
data and monitor and manage events in contexts in which
reliability and guaranteed access to system resources are of
paramount importance. For example, data processing systems
are utilized to manage critical databases, automate
assembly and production lines, and implement complex
control systems. Because of the demands of such
mission-critical computing environments, fault tolerant data
processing systems were developed. Fault tolerant data processing
systems rely on specialized hardware to detect a hardware
fault and rapidly incorporate a redundant hardware component
into the data processing system in place of the failed
hardware component. Although fault tolerant data processing
systems can transparently provide reliable and nearly
instantaneous recovery from hardware failures, a high premium
is often paid in both hardware cost and performance
because the redundant components do no processing.
Furthermore, conventional fault tolerant data processing
systems only provide protection from system hardware
failures and do not address software failures, a far more
common source of system down time.

In response to the need for a data processing system that
provides both high availability of system resources and
protection from software failures, cluster architecture was
developed. A cluster can be defined as multiple loosely
coupled server machines that cooperate to provide client
processors with reliable and highly available access to a set
of system services or resources. High availability of cluster
resources, which can include both hardware and software
such as disks, volume groups, file systems, network
addresses, and applications, is ensured by defining takeover
relationships that specify which of the server machines
comprising a cluster assumes control over a group of
resources after the server machine that originally owned the
group of resources relinquishes control due to reconfiguration
of the cluster or failure.

Cluster configuration information specifying the server
machines (hereafter referred to as nodes) belonging to a
cluster and takeover relationships within the cluster have
conventionally been stored within a default configuration
file that is accessed by configuration utilities, event scripts,
and other programs executed on the various nodes within a
cluster. Because of the strict dependency of these node
programs on the configuration information stored within the
default configuration file, the default configuration file
utilized by a conventional cluster must be unmodifiable while
the cluster is active. Thus, in order to modify the topology
or resources of a conventional cluster, a cluster administrator
must take down the cluster, thereby rendering the system
services and resources of the cluster inaccessible to clients.

As should thus be apparent, an improved method and
system for reconfiguring a cluster is needed that permits a
cluster administrator to modify the default configuration of
a cluster without rendering the system resources and services
owned by the cluster inaccessible to clients.

SUMMARY OF THE INVENTION

It is therefore one object of the present invention to
provide an improved method and system for data processing.

It is another object of the present invention to provide a
method and system for configuring a cluster of computer
systems.

It is yet another object of the present invention to provide
a method and system for dynamically reconfiguring a cluster
of computer systems, which permit the default configuration
of the cluster to be modified without limiting the availability
of system resources and services owned by the cluster.

The foregoing objects are achieved as is now described.
A method and system for dynamically reconfiguring a
cluster of computer systems are disclosed. In accordance
with the present invention, a first configuration file is
provided at a plurality of computer systems within a cluster that
specifies a current configuration of the cluster. A second
configuration file is created at each of the plurality of
computer systems that specifies a modified configuration of
the cluster. The modified configuration is then verified. In
response to the verification, the cluster is operated utilizing
the modified configuration. In accordance with one
embodiment, the dynamic reconfiguration of the cluster can
include a reconfiguration of the cluster topology or
resources.

The above as well as additional objects, features, and
advantages of the present invention will become apparent in
the following detailed written description.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features believed characteristic of the invention
are set forth in the appended claims. The invention itself
however, as well as a preferred mode of use, further objects
and advantages thereof, will best be understood by reference
to the following detailed description of an illustrative
embodiment when read in conjunction with the accompanying
drawings, wherein:

FIG. 1 depicts an illustrative embodiment of a cluster
multi-processing system, which can be dynamically
reconfigured in accordance with the present invention;

FIGS. 2A-2B depict a layer diagram of the software
configuration of a node within the cluster illustrated in FIG. 1;

FIGS. 3A-3C together form a flowchart of an illustrative
embodiment of a method for dynamically reconfiguring a
cluster in accordance with the present invention; and

FIG. 4 is a flowchart illustrating the behavior of a peer
node in response to receipt of a cluster reconfiguration event.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENT

With reference now to the figures and in particular with
reference to FIG. 1, there is illustrated a cluster 10, which
may be dynamically reconfigured in accordance with the
present invention. As illustrated, cluster 10 is a
multi-processor data processing system including one or more
clients 12, which each comprise a personal computer or
workstation, for example. Each of clients 12 runs a "front
end" or client application that issues service requests to 
server applications running on nodes 20. A client 12 com- 
municates with other clients 12 and with nodes 20 via one 
of public Local Area Networks (LANs) 14 and 16, which 
can comprise Ethernet, Token-Ring, FDDI, or other networks.
Communication across public LANs 14 and 16 is
regulated by a communication protocol such as TCP/IP. 

In order to ensure that cluster resources are highly avail- 
able to clients 12, cluster 10 includes multiple nodes 20, 
which preferably comprise uniprocessor or multi-processor 
data processing systems, such as the RISC System/6000 
available from IBM Corporation. In contrast to fault tolerant 
systems, which include idle redundant hardware, each node 
20 within cluster 10 runs a server or "backend" application, 
which performs tasks in response to service requests issued 
by clients 12. As illustrated, in order to eliminate as many 
single points of failure as possible, each of nodes 20 has at
least two network adapters for each connected network: a 
service network adapter 22 and a standby network adapter 
24. Service network adapter 22, which provides the primary 
connection between a node 20 and one of public LANs 14 
and 16, is assigned an IP address that is published to 
application programs running on clients 12. If a service 
network adapter 22 of one of nodes 20 fails, High Avail- 
ability Cluster Multi-Processing (HACMP) software run- 
ning on the node 20 substitutes the IP address assigned to 
standby network adapter 24 for the IP address assigned to 
the failed service network adapter 22 in order to maintain 
high availability of cluster resources and services. Similarly, 
if a local node 20 is designated to service requests addressed 
to a peer node 20 should the peer node 20 fail, standby 
network adapter 24 of local node 20 is assigned the IP 
address of the service network adapter 22 of peer node 20 
upon failure. In addition to the communication path pro- 
vided by public LANs 14 and 16, nodes 20 can communicate 
across private LAN 26, which provides peer-to-peer com- 
munication between nodes 20, but does not permit access by 
clients 12. Private LAN 26 can be implemented utilizing an 
Ethernet, Token-Ring, FDDI, or other network. 
Alternatively, a serial communication channel such as a 
SCSI-2 differential bus or RS-232 serial line can be utilized. 
As depicted, cluster 10 also includes shared disks 32. [The
remainder of this passage is illegible in the source scan; its
legible fragments refer to redundant connections to the shared
disks and to the addresses assigned to the network adapters.]
These are among the cluster resources which the architecture
of cluster 10 makes highly available to clients 12.
The operation and components of cluster multi-processing 
systems such as cluster 10 are described in greater detail in 
High Availability Cluster Multi-Processing 4.1 for AIX: 
Concepts and Facilities, which is available from IBM Cor- 
poration as Publication No. SC23-2767-01 and is incorpo- 
rated herein by reference. 
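
The adapter failover described above for FIG. 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not HACMP code: the `Adapter` and `Node` classes, the adapter names, and the addresses are hypothetical, and the local case reads the passage as the standby adapter taking over the published service address.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Adapter:
    name: str
    ip: Optional[str]       # address currently bound to this adapter
    healthy: bool = True


@dataclass
class Node:
    name: str
    service: Adapter        # stands in for service network adapter 22
    standby: Adapter        # stands in for standby network adapter 24

    def fail_over_local(self) -> None:
        """Service adapter failed: the standby adapter takes over the
        published address (an interpretation of the text, see lead-in)."""
        self.service.healthy = False
        self.standby.ip, self.service.ip = self.service.ip, None

    def take_over_peer(self, peer: "Node") -> None:
        """Peer node failed: answer the peer's published service address
        on this node's standby adapter."""
        peer.service.healthy = False
        self.standby.ip, peer.service.ip = peer.service.ip, None


# Hypothetical addresses, for illustration only.
node_a = Node("a", Adapter("en0", "10.0.0.1"), Adapter("en1", "10.0.1.1"))
node_b = Node("b", Adapter("en0", "10.0.0.2"), Adapter("en1", "10.0.1.2"))
node_c = Node("c", Adapter("en0", "10.0.0.3"), Adapter("en1", "10.0.1.3"))

node_a.fail_over_local()        # node a loses its service adapter
node_c.take_over_peer(node_b)   # node b fails; node c serves its address
```

In both cases the address that clients 12 already know keeps answering, which is the point of pairing every service adapter with a standby adapter.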

Referring now to FIGS. 2A and 2B, software and data 
models of each node 20 within cluster 10 are depicted. The 
illustrated software and data are considered private, in that 
each node 20 individually stores its respective software and 
data internally rather than in shared disks 32. With reference 
first to FIG. 2A, the software model of each node 20 has an 
AIX (Advanced Interactive Executive) layer 44 at the lowest
level, which provides operating system services such as 
memory allocation and thread scheduling. As illustrated, 
AIX layer 44 includes a Logical Volume Manager (LVM) 
subsystem 40, which manages shared disks 32 at the logical
level. In addition, a TCP/IP subsystem 42 is provided that 
manages communication between local node 20 and clients 
12 and peer nodes 20 across public LANs 14 and 16 and 
private LAN 26. 

In accordance with the present invention, the software 
model of each node 20 also includes a High Availability 
Cluster Multi-Processing (HACMP) layer 50, which inter- 
acts with AIX layer 44 to provide a highly available com- 
puting environment. HACMP layer 50 includes cluster man- 
ager 52, which has the primary responsibilities of 
monitoring the hardware and software subsystems of cluster 
10, tracking the state of nodes 20, and responding appro- 
priately to maintain the availability of cluster resources in 
response to a change in the state of cluster 10. Cluster 
manager 52 monitors the status of neighboring nodes 20 by 
transmitting "heartbeat" messages to which neighboring 
nodes 20 respond with similar messages. In response to a 
change in the state of cluster 10, for example, a failure of one 
of nodes 20, cluster manager 52 runs appropriate scripts to 
implement the takeover relationships specified within 
default configuration file 70 (illustrated in FIG. 2B). As 
described in greater detail below, cluster manager 52 also 
responds to Dynamic Automatic Reconfiguration Events 
(DAREs), which are initiated by a cluster administrator in 
order to dynamically reconfigure the topology or resources 
of cluster 10. 
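
The heartbeat monitoring performed by cluster manager 52 might look roughly like the following Python sketch. It is a simplified model, not the HACMP implementation: the `HeartbeatMonitor` class, the timeout value, the node names, and the `on_state_change` helper are all assumptions introduced for illustration, and time is passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatMonitor:
    timeout: float                                  # assumed: tolerated silence, in seconds
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record_reply(self, node: str, now: float) -> None:
        """Note that a neighboring node answered a heartbeat at time `now`."""
        self.last_seen[node] = now

    def failed_nodes(self, now: float) -> List[str]:
        """Neighbors whose last reply is older than the timeout."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout)


def on_state_change(failed: List[str], takeover: Dict[str, str]) -> List[str]:
    """Map each failed node to its (hypothetical) takeover action, using
    takeover relationships like those kept in the configuration file."""
    return [f"{takeover[n]} takes over resources of {n}"
            for n in failed if n in takeover]


monitor = HeartbeatMonitor(timeout=3.0)
monitor.record_reply("node_b", now=0.0)
monitor.record_reply("node_c", now=0.0)
monitor.record_reply("node_c", now=4.0)     # node_b stays silent
actions = on_state_change(monitor.failed_nodes(now=5.0),
                          {"node_b": "node_a"})
```

Here a silent neighbor is treated as a state change, and the takeover table decides which node runs the recovery scripts.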

The software model illustrated in FIG. 2A further includes 
an application layer 54, which provides services to client 
applications running on clients 12 and itself utilizes services 
provided by AIX layer 44 and HACMP layer 50. Application 
software within application layer 54 is among the cluster 
resources to which HACMP 50 provides highly available 
access. 

Referring now to FIG. 2B, the data model of each active 
node 20 includes an event queue 60, a default configuration 
file 70, and an active configuration file 72. Event queue 60 
is a first in/first out (FIFO) queue that temporarily stores 
events to be processed by HACMP layer 50. Events within 
event queue 60 can include events generated internally by 
cluster manager 52 as well as events received by cluster 
manager 52 from clients 12 and peer nodes 20. Default 
configuration file 70 contains cluster and AIX configuration 
information utilized by HACMP configuration utilities, clus- 
ter daemons such as cluster manager 52, and the event 
scripts executed by cluster manager 52. When cluster dae- 
mons are started on a node 20, the contents of default 
configuration file 70 are copied to active configuration file 
72 for access by all cluster daemons, event scripts, and 
HACMP configuration utilities. As described in greater 
detail below with reference to FIGS. 3A-3C, cluster man- 
ager 52 creates a temporary configuration file 74 in response 
to receipt of a DARE in order to prevent the corruption of 
configuration files 70 and 72 during the implementation of 
the specified configuration modifications. 
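
As a rough illustration of this data model, the Python sketch below keeps the three configuration files as in-memory dictionaries. It is a minimal sketch under stated assumptions: the `NodeData` class and its method names are hypothetical, the configuration contents are invented, and discarding the temporary copy on abort is an assumption rather than something the text specifies.

```python
from collections import deque
from copy import deepcopy
from typing import Deque, Dict, List, Optional

Config = Dict[str, List[str]]


class NodeData:
    def __init__(self, default: Config) -> None:
        self.events: Deque[str] = deque()           # event queue 60 (FIFO)
        self.default: Config = default              # default configuration file 70
        self.active: Config = {}                    # active configuration file 72
        self.temporary: Optional[Config] = None     # temporary configuration file 74

    def start_daemons(self) -> None:
        """On daemon start, copy the default configuration to the active one."""
        self.active = deepcopy(self.default)

    def stage(self, new_config: Config) -> None:
        """On receipt of a DARE, hold the new configuration in the temporary
        file so that files 70 and 72 are not touched during verification."""
        self.temporary = deepcopy(new_config)

    def commit(self) -> None:
        """After successful verification, overwrite active with temporary."""
        if self.temporary is None:
            raise RuntimeError("no staged configuration")
        self.active, self.temporary = self.temporary, None

    def abort(self) -> None:
        """Assumed behavior: drop the staged copy, leave active untouched."""
        self.temporary = None


node = NodeData({"nodes": ["a", "b"]})
node.start_daemons()
node.events.append("DARE")                  # events are queued first in
node.stage({"nodes": ["a", "b", "c"]})
before_commit = list(node.active["nodes"])  # daemons still see the old view
handled = node.events.popleft()             # and processed first out
node.commit()
```

Because daemons and event scripts read only the active copy, the cluster keeps running on the old configuration until the commit step replaces it.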

With reference now to FIGS. 3A-3C, there is illustrated 
a logical flowchart of an illustrative embodiment of a 
method for dynamically reconfiguring a cluster in accor- 
dance with the present invention. The depicted method is 
performed within a local node 20 of cluster 10 by cluster 
manager 52 and other cluster daemons within HACMP layer 
50. As illustrated, the process begins at block 80 upon the 
activation of local node 20 and thereafter proceeds to block 
82, which illustrates starting cluster manager 52 and other 
daemons on local node 20. The process then proceeds to 
block 84, which depicts HACMP layer 50 copying default 
configuration file 70 to active configuration file 72 in 
response to the activation of the cluster daemons at block 82.
As noted above, active configuration file 72 can then be
accessed by all cluster daemons, event scripts, and HACMP
configuration utilities running on local node 20. The process
proceeds from block 84 to block 86, which illustrates the
application of user-specified modifications of the cluster
configuration to local default configuration file 70. The
contents of local default configuration file 70 are then
transmitted to all peer nodes 20, as indicated at block 88. Despite
the changes made to local default configuration file 70 at
block 86, no change is reflected in the operation of cluster
10 because the software running on local node 20 references
active configuration file 72 rather than default configuration
file 70. The process then proceeds to block 90, which
illustrates cluster manager 52 determining whether or not a
DARE has been received at local node 20. In the depicted
configuration of cluster 10, a DARE can be conveniently
initiated by a cluster administrator through entry of inputs
via a user interface provided within one of clients 12 that
indicate that the modifications to default configuration file
70 are to be implemented throughout cluster 10.

In response to a determination at block 90 that a DARE
has not been received, the process iterates at block 90 until
such time as a DARE is received. In response to receipt of
a DARE at local node 20, the process proceeds from block
90 to block 92, which illustrates HACMP layer 50
determining whether or not a DARE is currently active within
cluster 10. If so, the DARE received at block 90 is aborted,
as illustrated at block 94. The serialization of DAREs in this
manner prevents a cluster-wide configuration corruption,
which could otherwise result from the simultaneous
application of conflicting cluster configurations. In response to a
determination at block 92 that another DARE is not
currently pending, the process proceeds to block 96, which
depicts HACMP layer 50 of local node 20 validating the
configuration information within the DARE. Next, the
process proceeds to block 98, which illustrates a determination
of whether or not the DARE specifies a reconfiguration of
the topology of cluster 10, for example, by adding or
deleting a node, network, or adapter. If so, the process passes
through page connector A to block 120 of FIG. 3B. The
determination illustrated at block 98 can be made, for
example, by calculating and comparing separate checksums
of the cluster topology information specified in the DARE
and that formerly stored within default configuration file 70.
In accordance with this embodiment, the checksum is a
variable-length bit stream, which is calculated by cluster
manager 52 in a specified manner that yields diverse
checksums for different cluster configurations.

Referring now to block 120 of FIG. 3B, following a
determination at block 98 that the DARE specifies a
reconfiguration of the topology of cluster 10, cluster manager 52
requests and verifies that cluster managers 52 on peer nodes
20 all process the DARE request in a number of
synchronized sequential steps. Thus, as depicted at block 120,
cluster manager 52 on local node 20 first transmits a DARE
to peer cluster managers 52.

Referring now to FIG. 4, there is depicted a flowchart
illustrating the method in which a peer node 20 within
cluster 10 responds to the DARE transmitted by
local node 20 at block 120 of FIG. 3B. As illustrated, the
process begins at block 200 in response to receipt of an event
and thereafter proceeds to block 202, which illustrates a
determination by peer cluster manager 52 whether or not a
DARE has been received. If not, the process passes to block
208, which illustrates peer node 20 processing the event
normally. Thereafter, the process terminates at block 210.
However, in response to a determination at block 202 that a
DARE has been received at peer node 20, the process
proceeds to block 204, which illustrates peer cluster
manager 52 suspending other cluster operations such as
heartbeat communication. Next, as shown at block 206, peer
cluster manager 52 copies the contents of local default
configuration file 70, which was previously sent to peer node
20, to peer temporary configuration file 74. The process then
terminates at block 210.

Referring again to block 120 of FIG. 3B, following the
transmission of the DARE, local cluster manager 52
determines whether or not all peer nodes 20 have suspended
cluster operations and copied local default configuration file
70 to peer temporary configuration file 74. If not, the process
passes to block 156, which illustrates local cluster manager
52 aborting the requested dynamic reconfiguration of cluster
10. Returning to block 122, in response to a determination
that cluster operations have been suspended by all peer
nodes 20 within cluster 10, the process proceeds to block
124, which depicts local cluster manager 52 transmitting a
message to peer nodes 20 requesting verification that the
temporary configuration files 74 of all peer nodes 20 are
identical. As illustrated at block 126, in response to a
determination that the reply messages from peer nodes 20
indicate that all temporary configuration files are identical,
the process proceeds to block 128. However, if the replies to
local cluster manager 52 indicate that not all temporary
configuration files 74 are identical, the reconfiguration
process is aborted, as depicted at block 156.

Returning to block 128, local cluster manager 52 then
transmits a message requesting that peer cluster managers 52
set the state of components common to both the new and old
cluster topologies to the state specified in the old
configuration. For example, the state of components in the old
configuration is set to active and the state of newly added
components is set to "offline" or inactive. The process then
proceeds to block 130, which illustrates a determination
whether the request made at block 128 has been completed
by all peer nodes 20. If not, the process proceeds to block
156 in the manner which has been described. However, in
response to a determination that the state of the components
common to the new and old configurations has been set as
requested, the process proceeds to block 140, which
illustrates local cluster manager 52 transmitting a message
requesting that peer cluster managers 52 set the state of
components which are members of only the new or old
topology to offline. Following a successful verification at
block 142 that the request made at block 140 was fulfilled by
all peer nodes 20, the process proceeds to block 144.

Block 144 depicts cluster manager 52 initiating heartbeat
communication with newly added components of cluster 10.
The process proceeds from block 144 to block 146, which
illustrates a determination of whether or not the heartbeat
messages issued at block 144 were answered by the newly
added components of cluster 10. If not, the reconfiguration
of cluster 10 is aborted at block 156 in the manner which has
been described. However, if local cluster manager 52
receives a return heartbeat message from each of the newly
added components, the process passes from block 146 to
block 148, which illustrates cluster manager 52 transmitting
a message requesting that peer cluster managers 52 copy
their respective temporary configuration file 74 to active
configuration file 72. The process then proceeds to block
150, which depicts a determination of whether or not the
overwriting of active configuration file 72 has been
completed by all peer nodes 20 within cluster 10. If not, the
reconfiguration process is aborted at block 156. However, if
the modification of active configuration file 72 has been
accomplished by all peer nodes 20, the process passes to
block 152, which illustrates local node 20 transmitting a
message to peer nodes 20 requesting that peer nodes 20
resume cluster operations such as heartbeat communication
and failure recovery. Following the execution of the step
illustrated at block 152, nodes 20 are again available to
process DAREs. Thereafter, the process terminates at block
154.

Returning to block 98 of FIG. 3A, in response to a
determination that the DARE does not specify a
reconfiguration of the topology of cluster 10, the process passes to
block 99, which depicts a determination of whether or not
the DARE specifies a reconfiguration of cluster resources. If
not, the process terminates at block 100. However, if a
determination is made at block 99 that the DARE specifies
a reconfiguration of cluster resources, the process proceeds
through page connector B to block 170 of FIG. 3C. Block
170 illustrates local cluster manager 52 transmitting a
DARE to peer nodes 20. As illustrated at block 172, local
cluster manager 52 then determines from reply messages
whether or not peer nodes 20 have suspended other cluster
operations and copied local default configuration file 70 to
peer temporary configuration file 74. If not, the process
passes to block 194, which illustrates local cluster manager
52 aborting the requested dynamic reconfiguration of the
resources of cluster 10. However, in response to a
determination that all peer nodes 20 have responded to the DARE
as illustrated in FIG. 4, the process proceeds to block 174,
which depicts local cluster manager 52 transmitting a
message to peer nodes 20 requesting verification that the
temporary configuration files 74 of all peer nodes 20 are
identical. As illustrated at block 176, in response to a
determination that the return messages from peer nodes 20
indicate that all temporary configuration files are identical,
the process proceeds to block 178.

Block 178 illustrates local cluster manager 52
transmitting a message requesting that peer cluster managers 52
release any cluster resources owned by the peer node 20
under the old configuration, but not under the new
configuration. In accordance with the request depicted at block 178,
peer nodes 20 discontinue the provision of system services
that require the use of network adapters, application
programs, and other cluster resources no longer owned by
that peer node 20. A determination is then made by local
cluster manager 52 whether or not all peer nodes 20 affected
by the request issued at block 178 have complied with the
request. If not, the process passes to block 194, which
illustrates cluster manager 52 aborting the resource
reconfiguration of cluster 10. If, on the other hand, all peer nodes
20 to which the request issued at block 178 is applicable
have released the indicated resources, the process proceeds
from block 180 to block 182. Block 182 depicts cluster
manager 52 transmitting a message requesting that each peer
node 20 acquire any cluster resources owned by the peer
nodes 20 only under the new configuration. As before, if the
request depicted at block 182 is not completed by all
applicable peer nodes 20, the process proceeds from block
184 to block 194 where the resource reconfiguration is
aborted. However, following a verification at block 184 that
the request depicted at block 182 has been completed by all
peer nodes 20, the process proceeds to block 186, which
illustrates local cluster manager 52 transmitting a message
requesting that peer cluster managers 52 copy their
respective temporary configuration file 74 to active configuration
file 72. Next, the process proceeds from block 186 to block
188, which illustrates a determination by local cluster
manager 52 whether or not all peer nodes 20 have completed the
request depicted at block 186. If not, the reconfiguration
process is aborted at block 194. However, if all peer nodes
20 have overwritten their respective active configuration file
72 with the new cluster configuration, the process passes to
block 190, which illustrates local cluster manager 52
transmitting a message to peer nodes 20 requesting that peer
cluster managers 52 resume other cluster operations. The
process then terminates at block 192.

As has been described, the present invention provides a
method and system for dynamically reconfiguring a cluster
of computer systems which permit uninterrupted provision
of system services to clients within the cluster. In accordance
with the present invention, updates to the configuration files
of peer nodes within the cluster are performed in a
synchronized manner in order to ensure that a consistent
configuration is maintained across all nodes within the
cluster.

While the invention has been particularly shown and
described with reference to a preferred embodiment, it will
be understood by those skilled in the art that various changes
in form and detail may be made therein without departing
from the spirit and scope of the invention. For example,
although the present invention has been described with
reference to an illustrative embodiment in which software
implementing the present invention is stored within and
executed on a multi-processing cluster, such software can
alternatively be embodied as a computer program product
residing within recordable media such as magnetic and
optical disks or transmission media such as digital and
analog communications links.

What is claimed is:

1. A method of dynamically reconfiguring a cluster
including a plurality of computer systems, said method
comprising:

providing a first configuration file at each of a plurality of
computer systems within a cluster, said first
configuration file containing a current configuration of said
cluster;

creating a second configuration file at each of said
plurality of computer systems, wherein said second
configuration file specifies a modified configuration of said
cluster, said modified configuration including a
modified cluster topology;

verifying said modified configuration, wherein verifying
said modified configuration includes establishing
communication with each computer system within said
modified configuration and not within said current
configuration; and

in response to said verification, operating said cluster
utilizing said modified configuration.

2. The method of dynamically reconfiguring a cluster of
claim 1, wherein said steps of creating, verifying, and
operating are synchronized between said plurality of
computer systems.

3. The method of dynamically reconfiguring a cluster of
claim 1, and further comprising:

determining whether said creating and verifying steps
were successfully completed by all of said plurality of
computer systems; and

in response to a failure of any of said plurality of
computer systems to complete said creating and
verifying steps, aborting said dynamic reconfiguration of
said cluster.

4. A method of dynamically reconfiguring a cluster
including a plurality of computer systems, said method
comprising:
providing a first configuration file at each of a plurality of
computer systems within a cluster, said first configu- 
ration file containing a current configuration of said 
cluster; 

creating a second configuration file at each of said plu- 
rality of computer systems, wherein said second con- 
figuration file at each of said plurality of computer 
systems specifies a modified configuration including a 
diverse set of cluster resources than specified by said 
first configuration file; 
verifying said modified configuration; and 
in response to said verification, operating said cluster 
utilizing said modified configuration. 

5. The method of dynamically reconfiguring a cluster of 
claim 4, wherein said step of verifying said modified con- 
figuration comprises verifying ownership by said plurality of 
computer systems of selected resources accessible under 
said modified configuration. 

6. The method of dynamically reconfiguring a cluster of 
claim 4, wherein said steps of creating, verifying, and 
operating are synchronized between said plurality of com- 
puter systems. 

7. The method of dynamically reconfiguring a cluster of 
claim 4, and further comprising: 

determining whether said creating and verifying steps 
were successfully completed by all of said plurality of 
computer systems; and 

in response to a failure of any of said plurality of 
computer systems to complete said creating and veri- 
fying steps, aborting said dynamic reconfiguration of 
said cluster. 

8. A cluster data processing system, comprising: 

a plurality of computer systems, each of said plurality of 
computer systems including a memory; 

a communications link connecting each of said plurality 
of computer systems with at least one other of said 
plurality of computer systems for communication; 

a first configuration file stored within memory at each of 
said plurality of computer systems, said first configu- 
ration file containing a current configuration of said 
cluster; and 

a cluster manager stored within said memory of at least 
one of said plurality of computer systems and execut- 
able by at least one of said plurality of computer 
systems, wherein said cluster manager creates a second 
configuration file at each of said plurality of computer 
systems, said second configuration file specifying a 
modified configuration of said cluster, and wherein said 
cluster manager verifies said modified configuration of 
said cluster, and in response to said verification, oper- 
ates said cluster utilizing said modified configuration. 

9. The cluster data processing system of claim 8, wherein
said second configuration file created at each of said plu- 
rality of computer systems by said cluster manager specifies 
a modified cluster topology. 

10. The cluster data processing system of claim 9, wherein 
said cluster manager verifies said modified configuration by 
establishing communication with each computer system that 
is within said modified configuration and not within said 
current configuration.

11. The cluster data processing system of claim 8, wherein 
said second configuration file created at each of said plurality
of computer systems by said cluster manager specifies
a diverse set of cluster resources than said first configuration 
file. 

12. The cluster data processing system of claim 11,
wherein said cluster manager verifies said configuration by
verifying ownership by said plurality of computer systems of
selected resources accessible under said modified
configuration.

13. The cluster data processing system of claim 8, wherein 
said cluster manager synchronizes creation of said second
configuration file, verification of said modified
configuration, and operation of said cluster utilizing said
modified configuration between said plurality of computer
systems. 

14. The cluster data processing system of claim 8, wherein
said cluster manager aborts said dynamic reconfiguration of
said cluster in response to a failure of any of said plurality of
computer systems to successfully complete creation of said
second configuration file and verification of said modified
configuration.

15. A program product for dynamically reconfiguring a 
cluster including a plurality of computer systems, said 
program product comprising: 

a cluster manager executable by at least one of a plurality
of computer systems within a cluster, each of said
plurality of computer systems within said cluster having
a first configuration file containing a current configuration
of said cluster, wherein said cluster manager
creates a second configuration file at each of said
plurality of computer systems, said second configuration
file specifying a modified configuration including
a modified cluster topology, and wherein said cluster
manager verifies said modified configuration of said
cluster, and in response to said verification, operates
said cluster utilizing said modified configuration; and

an information bearing media bearing said cluster manager.

16. The program product of claim 15, wherein said cluster
manager verifies said modified configuration by establishing
communication with each computer system that is within
said modified configuration and not within said current
configuration.

17. The program product of claim 15, wherein said cluster
manager synchronizes creation of said second configuration
file, verification of said modified configuration, and operation
of said cluster utilizing said modified configuration
between said plurality of computer systems.

18. The program product of claim 15, wherein said cluster
manager aborts dynamic reconfiguration of said cluster in
response to a failure of any of said plurality of computer
systems to successfully complete creation of said second
configuration file and verification of said modified
configuration.

19. A program product for dynamically reconfiguring a
cluster including a plurality of computer systems, said 
program product comprising: 

a cluster manager executable by at least one of a plurality
of computer systems within a cluster, each of said
plurality of computer systems within said cluster having
a first configuration file containing a current configuration
of said cluster, wherein said cluster manager
creates a second configuration file at each of said
plurality of computer systems, said second configuration
file specifying a modified configuration including
a diverse set of cluster resources than specified by said
first configuration file, and wherein said cluster manager
verifies said modified configuration of said cluster,
and in response to said verification, operates said
cluster utilizing said modified configuration; and

an information bearing media bearing said cluster manager.

20. The program product of claim 19, wherein said cluster
manager verifies said modified configuration by verifying 
ownership by said plurality of computer systems of selected 
resources accessible under said modified configuration. 

21. The program product of claim 19, wherein said cluster 
manager synchronizes creation of said second configuration 
file, verification of said modified configuration, and opera- 
tion of said cluster utilizing said modified configuration 
between said plurality of computer systems.

22. The program product of claim 19, wherein said cluster
manager aborts dynamic reconfiguration of said cluster in 
response to a failure of any of said plurality of computer 
systems to successfully complete creation of said second 
configuration file and verification of said modified configu- 
ration. 
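
By way of illustration only, and forming no part of the claims, the synchronized create, verify, and operate steps recited above, together with the abort on failure, might be modeled as in the following minimal sketch. The Node class, its method names, the reachable flag, and the representation of a configuration as a mapping from node name to owned resources are all hypothetical and are assumed here purely for the example. The behavior on abort (discarding the temporary configuration and restoring holdings from the active configuration) is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Assumed representation: node name -> set of resources that node owns.
Config = Dict[str, Set[str]]


@dataclass
class Node:
    """Hypothetical peer node with an active and a temporary configuration."""
    name: str
    active: Config
    temporary: Optional[Config] = None
    held: Set[str] = field(default_factory=set)
    reachable: bool = True  # set to False to simulate a peer that fails a step

    def create_temporary(self, modified: Config) -> bool:
        """Create the second (temporary) configuration at this node."""
        if not self.reachable:
            return False
        self.temporary = {n: set(r) for n, r in modified.items()}
        return True

    def release_old(self) -> bool:
        """Release resources this node does not own under the modified configuration."""
        if not self.reachable or self.temporary is None:
            return False
        self.held &= self.temporary.get(self.name, set())
        return True

    def acquire_new(self) -> bool:
        """Acquire resources this node owns under the modified configuration."""
        if not self.reachable or self.temporary is None:
            return False
        self.held |= self.temporary.get(self.name, set())
        return True

    def commit(self) -> None:
        """Copy the temporary configuration to the active configuration."""
        if self.temporary is not None:
            self.active, self.temporary = self.temporary, None

    def abort(self) -> None:
        """Assumed abort behavior: drop the temporary file, restore old holdings."""
        self.temporary = None
        self.held = set(self.active.get(self.name, set()))


def reconfigure(nodes: List[Node], modified: Config) -> bool:
    """Run each step on every node; abort everywhere if any node fails a step."""
    steps = (
        lambda n: n.create_temporary(modified),
        Node.release_old,
        Node.acquire_new,
    )
    for step in steps:
        results = [step(n) for n in nodes]
        if not all(results):
            for n in nodes:
                n.abort()
            return False
    # A failure during the final copy is not modeled in this sketch.
    for n in nodes:
        n.commit()
    return True
```

In this sketch every step is applied to all nodes before the next step begins, which mirrors the synchronization between computer systems recited in the claims. A single failing node causes every node to discard its temporary configuration, so the cluster continues to operate under the current configuration.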